A Computer Processible Nomenclature
For Gene Symbols

Allan L. Shaffon, Raymond L. Clark, Willis H. Skrdla, and Kanti Rawal


Introduction 

Since 1947, when the North Central Regional Plant Introduction Station was established in Ames, Iowa, the genetic resources available there have been cataloged by hand. A seed list of available tomato lines was produced by typing, proofreading, and correcting stencils and by subsequent mimeographing and assembling of copies. Distribution of these mimeographed seed lists represented the final step in disseminating information about these genetic resources.

This mode of disseminating information is now inadequate. Each year, after the annual accessions evaluation was completed, the seed list required updating, which entailed integrating the new data and then revising the list. Further, many cooperating scientists maintained files of hand- or type-written reports in various formats, and these data often were not included in the seed lists.

Manual procedures also proved inadequate in processing requests for tomatoes with specific combinations of useful characteristics. These requests were handled by visually scanning the seed list and noting those accessions with the desired character combinations. When the collection was small (fewer than 1,000 lines) and only two traits were considered, selecting desired accessions could be accomplished easily.

Now, however, the tomato collection contains more than 4,000 lines, and we frequently receive requests for material with several traits. With manual techniques, these requests are difficult to handle. As a result, much useful data and many useful genetic resources are not adequately utilized.

Information  Storage  and  Retrieval 

Computer-assisted methods have proved useful in facilitating the use and exchange of information. Among the various computer tools is an information storage and retrieval (IR) system that simplifies extracting a subset from the information in a data collection. Individual IR systems have unique features, but generally, they allow the user to obtain selected information from the entire data collection. When plant genetic resources are considered, selection criteria typically are combinations of plant traits.

To process data with a computerized IR system, several preliminary steps are required: (a) the user must decide which variables (characters) are to be considered, (b) a meaningful scheme for representing values for these variables must be determined, (c) data must be prepared in a machine-readable form, and (d) the machine-readable data must be processed by the IR system. Many IR systems keep stored data in an internal format, which is called a data bank. Once prepared, the data bank is easily interrogated and updated.
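The four preliminary steps can be illustrated with a minimal Python sketch. It is not EXIR, whose internal format is not described here; the list-of-dictionaries "data bank", the function names, and the two sample rows are assumptions made only for illustration.

```python
import csv
import io

# (a) Descriptors (variables) chosen for this sketch; the names follow the paper.
DESCRIPTORS = ["SYMBOL", "NAME", "LINKAGE GROUP"]

# (b) Representation scheme: every state is a string; an empty field means unknown.
UNKNOWN = None

# (c) Machine-readable input. These rows are placeholders, not real gene data.
RAW = """SYMBOL,NAME,LINKAGE GROUP
xa,example-a,2
xb,example-b,
"""


def build_data_bank(text):
    """(d) Convert machine-readable rows into an internal data bank (list of dicts)."""
    reader = csv.DictReader(io.StringIO(text))
    return [{d: (row[d] or UNKNOWN) for d in DESCRIPTORS} for row in reader]


def update(bank, symbol, descriptor, state):
    """Set one descriptor state for the item with the given SYMBOL."""
    for item in bank:
        if item["SYMBOL"] == symbol:
            item[descriptor] = state
            return True
    return False


bank = build_data_bank(RAW)
update(bank, "xb", "LINKAGE GROUP", "7")  # later evaluation fills in an unknown state
```

Once the bank exists, interrogation and updating are ordinary operations on the stored records, which is the property the text attributes to a prepared data bank.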

Existing data on a World Collection of the Genus Lycopersicon raised an interesting point. Not only were identification and morphological, pathological, and horticultural observations recorded, but also data from gene analysis. For storing genetic data, gene symbols must be represented in a computer-processible nomenclature.

Gene  Symbol  Nomenclature 

Gene symbolization is based on an established system, but considerable variation exists in published nomenclature (2, 4, 5, 6, 7). Communication among breeders and geneticists could be enhanced greatly by using a standard system for nomenclature (7). The Tomato Genetics Cooperative (TGC) has published rules for nomenclature in tomato genetics (2, 3, 4), and we used these as the basis for the computer-processible nomenclature for gene symbols described in this paper. Table 1 contains the nomenclatorial rules set forth by the Gene List Committee of the TGC (2, 4).

Computing machines have certain limitations regarding printable characters or symbols; therefore, we made minor modifications in the TGC genetic nomenclature to produce a computer-processible nomenclature. The suggested changes in the nomenclature are listed in table 2. In summary, these changes reflect the unavailability of superscripts and italics. Modified notation for superscripts includes two forms: "|" and "[". The symbol "|" has been used in this paper. The modification concerning parentheses is not a general computing limitation but is imposed by the IR system that we used.
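A short Python sketch of these modifications follows. The function name and its arguments are hypothetical; the sketch assumes "|" as the character that announces a superscript and "<" and ">" as the stand-ins for parentheses, as in the examples of table 2.

```python
def to_processible(base, superscript=None, marker="|"):
    """Return a plain-character form of a gene or aberration symbol.

    base        -- the symbol as written on the line, e.g. "d", "ag", "T(1-2)a"
    superscript -- the superscript text in TGC notation, e.g. "x" or "(1K)"
    marker      -- character announcing that a superscript follows (assumed "|")
    """
    def brackets(text):
        # Parentheses are not accepted by the IR system, so "<" and ">" replace them.
        return text.replace("(", "<").replace(")", ">")

    symbol = brackets(base)
    if superscript:
        symbol += marker + brackets(superscript)
    return symbol


print(to_processible("d", "x"))        # d|x
print(to_processible("ag", "(1K)"))    # ag|<1K>
print(to_processible("T(1-2)a"))       # T<1-2>a
```

Italics are simply dropped in this scheme, so the distinction between gene symbols and aberration symbols has to come from context.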

One additional nomenclatorial rule, not explicitly discussed in the current TGC rules, concerns mimics, which makes the use of subscripts unnecessary. As shown in subsequent examples, however, cross-referencing obsolete symbols may be necessary. In such cases, we used the symbol " — " or " I" to indicate that a subscript follows. Thus, d₂ becomes d^2 or d \2. The two alternative forms of notation reflect machine differences.

To demonstrate the applications of computer-processible notation for gene symbols, we constructed two data banks by using the Executive Information Retrieval System (EXIR). One contains information extracted from the list of tomato genes published by the TGC (3), and the second contains information on accessions in the World Collection of the Genus Lycopersicon (5). Only subsets of available data have been used to limit the output volume in the examples.

The  term  "descriptor"  is  defined  as  a  variable 
or  attribute  that  is  observed  on  a  set  of  elements 
in  a  data  bank.  The  values  for  a  descriptor  are 
referred  to  as  "descriptor  states."  Each  data  bank 
will  be  discussed  individually. 

Tomato  Gene  List 

Descriptors included in the gene list data bank reflect the format of the report of the Gene List Committee in TGC Report No. 21 (3), and they demonstrate the effectiveness of an IR system for cross-referencing. The descriptors SYMBOL and NAME follow the rules for nomenclature and identify the mutant gene under consideration. The descriptor PRIMARY CLASS and descriptors CLASS 1 through CLASS 21 are adaptations of the classification scheme devised by the TGC to seek lines of a specific phenotype (3). The classification key is given in table 3. Each line is classified according to its salient phenotypic characters and may occur in several of the 21 phenotype classes. For example, a line with yellow and white variegation will fall in classes 2 (white), 3 (yellow), and 7 (variegated) (3). The PRIMARY CLASS is the one most significant to the mutant program, and it can be any class from 1 through 21. Valid descriptor states for the descriptors CLASS 1 through CLASS 21 are PRIM, YES, and NO, respectively, for instances in which a class is primary for the line, is affirmative but not primary, or is negative. SEED SOURCE 1 and SEED SOURCE 2 are coded descriptors that indicate where stocks with the gene can be obtained. TGC REF contains the number of the TGC report that contains more information. The descriptor LINKAGE GROUP shows the chromosome number on which the gene is located (TGC Report No. 23). Descriptors OBSOLETE SYMBOL 1 through OBSOLETE SYMBOL 4 provide cross-reference to obsolete or superseded symbols. Examples of data printouts by using these descriptors are shown in tables 4 and 5. The gene symbol lists were produced by the Report Generator Module of EXIR (EXIR RPG), and as genetic analyses continue, the gene data bank will require updating and production of updated reports.
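The PRIM, YES, and NO coding of the 21 class descriptors can be sketched in Python as follows. The function is hypothetical, and the choice of class 7 as the primary class for the variegated example is an assumption; the text states only that such a line falls in classes 2, 3, and 7.

```python
N_CLASSES = 21


def class_states(primary, others=()):
    """Return the descriptor states for CLASS 1 through CLASS 21 of one gene."""
    if not 1 <= primary <= N_CLASSES:
        raise ValueError("primary class must be between 1 and 21")
    states = {}
    for n in range(1, N_CLASSES + 1):
        if n == primary:
            states[f"CLASS {n}"] = "PRIM"   # the class is primary for the line
        elif n in others:
            states[f"CLASS {n}"] = "YES"    # affirmative but not primary
        else:
            states[f"CLASS {n}"] = "NO"     # negative
    return states


# Yellow and white variegation: classes 2, 3, and 7 (primary class assumed to be 7).
variegated = class_states(primary=7, others={2, 3})
```

Storing one descriptor per class, instead of a single list of classes, lets a query test any class with the same simple "descriptor, state" form used for every other descriptor.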

In the information retrieval examples that follow, only a subset (59) of the entire gene list data bank was used to keep the output volume small and simple.

The genes can be ordered in various ways, and table 6 displays the genes classified by linkage group. Genes on this list were selected by the EXIR expression NOT SYMBOL,UNKNOWN, which means all genes with a known (not unknown) gene symbol. In a display (EXIR PRINT), the notation "—" indicates that the descriptor state is unknown for the descriptor under consideration. Therefore, the last genes in this list have unknown linkage groups.

To retrieve information on all stocks in one repository, classified by primary phenotypic classes, the query shown in table 7 could be processed. The capability for retrieving currently accepted gene symbols for former symbols is illustrated in

Table 1. — Rules for nomenclature in tomato genetics (2, 4)


1. Chromosomes. The chromosomes are numbered according to their length measured in pachytene. Such numbers have already been applied by Barton (1); chromosome 1 is the longest, chromosome 12, the shortest. In addition to length, such features as position of centromere and amount and distribution of heterochromatin serve to identify each chromosome. Short arms are symbolized by "S," long ones by "L"; thus, "1S" designates the short arm of chromosome 1.

2. Linkage groups. Linkage groups bear the same numbers as their respective chromosomes. As soon as the arm location of a gene is known, the locus numbering shall be revised to reflect the information. The smaller arm of each chromosome is designated as the left arm, and the zero position is the distal or left end of the small arm.

3. Genes. Mutant genes are designated by letter symbols. The mutant name, an adjective or noun or a combination of both, refers to the main diagnostic feature of the phenotype. The initial letter of the symbol should be the same as that of the name; additional appropriate letters are added as necessary to distinguish it from other symbols already in use.

The genes c, r, s, and y do not conform to this rule, their symbols having unfortunately been derived from the normal instead of the mutant condition. Since these symbols have been in widespread use for over 40 years, they should be retained, but all others should conform to these rules.

After obtaining reasonable evidence for the existence of a new gene for which the phenotype can be distinguished reasonably well in some or all genotypic milieux, the discoverer should select an appropriate name and symbol, taking care not to use any already reported.

Special effort should be expended to find the appropriate names for new mutants. Latin names or their English derivatives are recommended for their international meaning.

The symbol should consist of the minimum letters or numbers necessary to distinguish it from all other existing symbols. Although lengthy symbols may have reference value to the discoverer, they are cumbersome for genetic notation and become a hindrance to other workers, who often make greater use of the mutants than the discoverer.

4. Alleles. Dominance or recessiveness of a mutant gene is indicated by comparison with a "standard" or "normal" type. The variety Marglobe is proposed as this normal type since it is widely grown and is typical of the general concept of normal tomato morphology.

A mutant gene dominant to the normal type is written with the initial letter of the name and symbol capitalized, while a recessive is written with all letters in lower case. The normal allele of a mutant gene is written with the symbol of the mutant gene followed by the superscript "+". Thus the normal allele of sp is sp⁺ and of I is I⁺. A dominant allele appearing later at the sp locus would be designated spᴰ. Additional alleles at the same locus are designated by appropriate letter or number superscripts; thus for the d locus, the following alleles are known: d, dˣ, dᶜʳ, and d⁺. For the first member of a numbered series of alleles, which is the preferable form of notation, the "1" shall be understood but not used. When it is clear in the text which gene is concerned, the normal allele may be designated simply by the symbol "+".

5. Indistinguishable alleles of independent origin (supposed reoccurrences). We recommend that these not be given unique symbols. If they are, however, they shall be designated by the existing gene symbol followed by a superscript, which shall be enclosed in parentheses and shall consist of an Arabic numeral corresponding to the particular reoccurrence and a unique abbreviation derived from the discoverer's name, this abbreviation to be assigned by the Gene List Committee to the discoverer and used by him for designating reoccurrences of any locus. For example, ag⁽¹ᴷ⁾ and ag⁽²ᴷ⁾ (formerly ag’ and ag*, respectively) are the first and second reoccurrences of ag found by Knowles. If the gene symbol already has a superscript, this shall precede the appended parenthetical superscript.

6. Mimics. Mutants that have different loci and are indistinguishable phenotypically from each other are known as mimics. Members of a single mimic series are preferably designated by different names and symbols (for example, u for uniform, ug for uniform gray-green). If a numbered series is used, the same name and base symbol are followed by a hyphen and a distinguishing number of the same type level. For the first member of such a mimic series, the "1" shall be understood but not used (for example, ms, ms-2, and so forth).

7. Translocations are designated by the symbol "T." The chromosomes involved are designated by their respective numbers. To distinguish between translocations involving the same chromosomes, lowercase letters are used following the chromosome numbers, thus T(1-2)a, T(1-2)b, and so forth.

8. Inversions are designated by the symbol "In", while the chromosome in which the inversion occurs is indicated by its respective number. Lowercase letters are used to distinguish different inversions on the same chromosome, thus In(1)a, In(1)b, and so forth.

9. Deficiencies are designated by the symbol "Df" and are distinguished in the same manner as inversions (rule 8).

10. Aneuploids. Monosomics and primary trisomics are designated according to the missing or extra chromosome; thus "triplo-1" refers to the primary trisomic of chromosome 1, and "haplo-12" refers to the monosomic for chromosome 12.

For aneuploids possessing whole arm interchanges, the chromosomes are symbolized according to their component arms, a connective dot representing the centromere; thus "1S·12L" specifies an interchange between the short arm of chromosome 1 and the long arm of chromosome 12. According to this system, 2n+10L·10L would be an example of a secondary trisomic, 2n+5S·7L an example of a tertiary trisomic, 2n+8L an example of a telotrisomic, and 2n−3S·3L+3S·3L an example of a compensating trisomic in which a normal chromosome is compensated by isochromosomes of its short arm and long arm.

11. To distinguish between gene symbols and symbols of the chromosome aberrations, the former are italicized.

12. Since chromosomes of all investigated species of Lycopersicon appear to be almost completely homologous, we suggest that the same symbolization apply to the entire genus. We also propose that the complete gene symbols not be duplicated among the species unless the genes in question are known to be identical, and that the key letters of the symbols not be duplicated unless the genes are alleles or mimics. Species alleles can be designated by a superscript to indicate the species, for example, a*’ for a hypothetical allele in L. hirsutum.

13. Much confusion will be avoided by clearing names and symbols with the chairman of either the Gene List Committee or Coordinating Committee of the TGC at the earliest opportunity. Such action will settle matters of priority and assist the investigator by assigning a reasonably permanent symbol to his mutant. Permanence cannot be guaranteed, however, because unknown earlier or simultaneous publication might establish priority.

14. Modifiers. Modifiers (mo or Mo) of specific genes shall be symbolized by the appropriate symbol followed by the symbol of the gene modified enclosed in parentheses. Subsequent nonallelic modifiers shall be denoted by adding a number suffix to the combined symbol. For example, the two dominant modifiers of I are Mo(I) and Mo(I)2. When the action of the modifying gene is known to be suppression, use a suppressor symbol (Sup); likewise, use an enhancer symbol (Enh) for genes of enhancing action. In either case, the gene whose expression is modified shall be symbolized in parentheses as specified for modifiers, thus, Enh(I), Sup(d).


table 8 by using the currently accepted symbol ah for the mutant formerly known as ao.
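The cross-reference from an obsolete symbol to the accepted one can be sketched in Python as a search over the OBSOLETE SYMBOL descriptors. The function name and the record layout are assumptions; the ah and ao pair comes from the text, and the second record is a hypothetical gene added only to show a non-match.

```python
OBSOLETE_DESCRIPTORS = [f"OBSOLETE SYMBOL {n}" for n in range(1, 5)]


def current_symbols(bank, former):
    """Return the accepted SYMBOL of every gene that lists `former` as obsolete."""
    return [
        gene["SYMBOL"]
        for gene in bank
        if any(gene.get(d) == former for d in OBSOLETE_DESCRIPTORS)
    ]


bank = [
    {"SYMBOL": "ah", "OBSOLETE SYMBOL 1": "ao"},  # from the text: ah was formerly ao
    {"SYMBOL": "xa"},                             # hypothetical gene, no obsolete symbol
]

print(current_symbols(bank, "ao"))  # ['ah']
```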

As a further example, we asked for a list of mutants that affect plant habit and size and either leaf necrosis or leaf form and size. The retrieved information shown in table 9 is ordered by SYMBOL, and other information includes the descriptors NAME, PRIMARY CLASS, and SEED SOURCE 1.
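The Boolean structure of this request, class 11 together with either class 8 or class 10, can be sketched in Python. The helper names are hypothetical and the four records are invented placeholders, not entries from the gene list.

```python
def in_class(gene, n):
    """True when CLASS n is PRIM or YES for this gene."""
    return gene.get(f"CLASS {n}", "NO") in ("PRIM", "YES")


def habit_and_leaf_mutants(bank):
    """CLASS 11 AND (CLASS 8 OR CLASS 10), ordered by SYMBOL."""
    hits = [
        g for g in bank
        if in_class(g, 11) and (in_class(g, 8) or in_class(g, 10))
    ]
    return sorted(hits, key=lambda g: g["SYMBOL"])


# Hypothetical records; a missing CLASS descriptor is treated as NO.
bank = [
    {"SYMBOL": "xc", "CLASS 11": "PRIM", "CLASS 10": "YES"},
    {"SYMBOL": "xa", "CLASS 11": "YES", "CLASS 8": "PRIM"},
    {"SYMBOL": "xb", "CLASS 11": "PRIM"},   # habit only: not selected
    {"SYMBOL": "xd", "CLASS 10": "PRIM"},   # leaf form only: not selected
]

selected = [g["SYMBOL"] for g in habit_and_leaf_mutants(bank)]
print(selected)  # ['xa', 'xc']
```

The parentheses matter: without them, a reading such as (class 11 and class 8) or class 10 would also return genes that affect leaf form alone.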

Table 2. — Modification to the rules for nomenclature in tomato genetics

(Refer to table 1)

1. No change

2. No change

3. No change

4. Superscripts are not available. Either the symbol "|" or "[" has been used to indicate that a superscript follows. The following allelic series illustrates this modification:

   TGC      Computer-processible
   d        d
   dˣ       d|x or d[x
   dᶜʳ      d|cr or d[cr
   d⁺       d|+ or d[+

5. As noted in number 4 above, an alternate notation for superscripts has been employed. One further modification affects this rule. The symbols "<" and ">" have been substituted for left and right parentheses, respectively. The following series of reoccurrences illustrates this modification:

   TGC      Computer-processible
   ag⁽¹ᴷ⁾   ag|<1K> or ag[<1K>
   ag⁽²ᴷ⁾   ag|<2K> or ag[<2K>

6. No change

7. As noted in number 5 above, an alternate notation for parentheses has been employed. Thus, T(1-2)a becomes T<1-2>a.

8. Same as number 7.

9. Same as number 7.

10. No change

11. Italics are not available. The distinction between gene symbols and symbols of the chromosomal aberrations should be derived from context.

12. For species alleles, the alternate superscript notation described in number 4 has been employed.

13. No change

14. Same as number 7.


World  Collection  of  the  Genus  Lycopersicon 

Descriptors in the Lycopersicon data bank were chosen to demonstrate how various traits can be combined to selectively retrieve information on lines that constitute any desired subset of the germplasm bank. Identification descriptors include PI NUMBER and SOURCE, and 12 genes were chosen to describe plant genotypes. The genes were chosen for their usefulness for illustrative purposes. The chosen loci were B, c, I, Mi, ms-9, ms-17, pe, r, sp, Sw, Ve, and Wo. The descriptor states that each locus can assume are the gene symbols for the alleles identified at that locus. Thus, for the descriptor Wo, the valid descriptor states are Wo, Wo|m, Wo|v, and + (an abbreviated representation of Wo|+).


Table 3. — Key to mutant classes (3)

Class  Description

 1     Anthocyanin modification: intensification, reduction, elimination
 2     Chlorophyll deficiency: white or whitish
 3     Chlorophyll deficiency: yellow or yellowish
 4     Chlorophyll deficiency: light grey or dull grey
 5     Chlorophyll deficiency: yellow-green
 6     Chlorophyll deficiency: virescent (localized at growing point)
 7     Irregular variegation, flecking or striping
 8     Leaf necrosis
 9     Hair modification: augmentation, reduction, distortion, elimination
10     Leaf form and size
11     Plant habit and size
12     Flower form and color
13     Inflorescence exclusive of 12
14     Sterility: any condition leading to partial or complete unfruitfulness
15     Fruit form and surface texture
16     Fruit color and flavor
17     Disease resistance
18     Miscellaneous characters: earliness, wilting, root mutation, and so forth
19     Seed
20     Foliage color: dark green
21     Foliage color, miscellaneous: olive, brown, blue-green


Table 4. — List of tomato mutant genes

Table 5. — List of mutant tomato genes

Descriptors for vine and foliage characters include VINE SIZE, GROWTH, HABIT, and LEAFLET SIZE. Fruit character descriptors include CONCENTRIC CRACKING, RADIAL CRACKING, MINIMUM FRUIT SIZE, MAXIMUM FRUIT SIZE, SHAPE, SET, MATURITY, FLESH INTENSITY, MIN NO LOCULES, MAX NO LOCULES, and FASCIATION.

Tables 10 and 11 display data for some accessions in the World Collection. These tables were produced by the Report Generator Module of EXIR (EXIR RPG), and they show the types of report that can be produced by a computerized IR system.


Such lists often are useful, but the greatest strength of the IR system is its capacity to retrieve information only on accessions of interest. For example, if one wants the sp allele occurring at the sp locus and the normal (+) allele at the I locus in accessions that are early to medium maturing (1 to 3), information on the selected accessions can be displayed. Table 12 shows the descriptors MATURITY, MAX NO LOCULES, GROWTH HABIT, and PI NUMBER for these accessions. Note that the output is ordered hierarchically, first by MAX NO LOCULES and then by GROWTH HABIT and PI NUMBER, respectively.
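A selection of this kind, combining allele states with a numeric range and then ordering the result hierarchically, can be sketched in Python. This is not EXIR syntax; the function names are hypothetical, the PI numbers and descriptor states below are invented for illustration, and only two ordering keys are used to keep the example short.

```python
def select(bank):
    """sp allele at sp, normal (+) allele at I, and MATURITY from 1 to 3."""
    return [
        a for a in bank
        if a["sp"] == "sp" and a["I"] == "+" and 1 <= a["MATURITY"] <= 3
    ]


def ordered(rows, keys):
    """Sort hierarchically: the first key is the most significant."""
    return sorted(rows, key=lambda a: tuple(a[k] for k in keys))


# Hypothetical accessions (not records from the World Collection).
bank = [
    {"PI NUMBER": 100003, "sp": "sp", "I": "+", "MATURITY": 2, "MAX NO LOCULES": 5},
    {"PI NUMBER": 100001, "sp": "sp", "I": "+", "MATURITY": 2, "MAX NO LOCULES": 3},
    {"PI NUMBER": 100002, "sp": "+",  "I": "+", "MATURITY": 1, "MAX NO LOCULES": 3},
    {"PI NUMBER": 100004, "sp": "sp", "I": "I", "MATURITY": 3, "MAX NO LOCULES": 3},
    {"PI NUMBER": 100005, "sp": "sp", "I": "+", "MATURITY": 5, "MAX NO LOCULES": 3},
    {"PI NUMBER": 100006, "sp": "sp", "I": "+", "MATURITY": 1, "MAX NO LOCULES": 5},
]

result = ordered(select(bank), ["MAX NO LOCULES", "PI NUMBER"])
print([a["PI NUMBER"] for a in result])  # [100001, 100003, 100006]
```

Passing the ordering keys as a list mirrors the way the user chooses the output order in each query; adding a further key simply extends the sort tuple.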


Table 6. — ID: Sample EXIR query on the tomato gene list data bank

PRINT: LINKAGE GROUP,(SYMBOL, NAME, PRIMARY CLASS, OBSOLETE 1)
FOR GENES WITH NOT SYMBOL,UNKNOWN*

NO. OF ITEMS IN QUERY RESPONSE = 59
NO. OF ITEMS IN THE DATA BANK = 59
PERCENTAGE OF RESPONSE/TOTAL DATA BANK = 100.00

adpressa
ao
anthocyanin gainer
11
anthocyaninless
12
12
alb  albescent  7
aud  auroid  3
apocarpous  15
acroxantha
accumbens-2
adusta-3
aegrota
anthocyanin gainer|<1K>
Table 7. — ID: Sample EXIR query on the tomato gene list data bank

PRINT: PRIMARY CLASS, SYMBOL
FOR GENES WITH SEED SOURCE 1,R OR SEED SOURCE 2,R*

NO. OF ITEMS IN QUERY RESPONSE = 13
NO. OF ITEMS IN THE DATA BANK = 59
PERCENTAGE OF RESPONSE/TOTAL DATA BANK = 22.03

1
ag
ag|<1K>
ag|<1S>
ag|<2K>
ah
aw|<1K>
af 1
aud
aeg
12
ap
16
at
aut


Table 8. — ID: Sample EXIR query on the tomato gene list data bank

PRINT: SYMBOL FOR THE GENE WITH OBSOLETE 1,ao*

NO. OF ITEMS IN QUERY RESPONSE = 1
NO. OF ITEMS IN THE DATA BANK = 59
PERCENTAGE OF RESPONSE/TOTAL DATA BANK = 1.69

ah


Table 9. — ID: Sample EXIR query on the tomato gene list data bank

PRINT: (SYMBOL, NAME, PRIMARY CLASS, SEED SOURCE 1)
FOR GENES WITH CLASS 11,PRIM OR YES
AND (CLASS 8,PRIM OR YES OR CLASS 10,PRIM OR YES)*

NO. OF ITEMS IN QUERY RESPONSE = 13
NO. OF ITEMS IN THE DATA BANK = 59
PERCENTAGE OF RESPONSE/TOTAL DATA BANK = 22.03

acr     acroxantha       4   S
acu-2   accumbens-2      10  S
adp     adpressa         11  S
adp-2   adpressa-2       11  S
afe     afertilis        14  S
ana     angusta          10  S
ang-2   angustifolia-2   4   S
ant     aurantia         4   S
apl     applanata        10  S
atn     attenuata        5   S
Table 11. — World collection of the genus Lycopersicon

Another example, in which the search was for accessions with medium to large vine size (3 to 5) and any allele other than the normal type at the Mi locus, is shown in table 13. The user in this case chose to order the output by PI NUMBER.


Table 14 displays a query for fruit characters and genotypic information on accessions with the r allele at the r locus and any allele other than the normal type at either the I or Sw locus. The output is ordered by MINIMUM FRUIT SIZE, MAXIMUM FRUIT SIZE, SHAPE, SET, and FASCIATION, respectively.


Table 12. — ID: Sample EXIR query on a subset of the world Lycopersicon collection data bank

PRINT: MATURITY,(MAX NO LOCULES, GROWTH, HABIT, PI NUMBER)
FOR ACCESSIONS WITH sp,sp AND I,+ AND MATURITY,FROM 1 TO 3*

NO. OF ITEMS IN QUERY RESPONSE = 7
NO. OF ITEMS IN THE DATA BANK = 83
PERCENTAGE OF RESPONSE/TOTAL DATA BANK = 8.43

3      —      ERECT   205046
5      DET    INTER   273444
MANY   —      INTER   193421
3             INTER   224577
3      —      INTER   254657
5      —      ERECT   224572
MANY   —      INTER   193411

Table 13. — ID: Sample EXIR query on a subset of the world Lycopersicon collection data bank

PRINT: (PI NUMBER, SOURCE, Mi, VINE SIZE)
FOR ACCESSIONS WITH VINE SIZE,FROM 3 TO 5 AND NOT Mi,+*

NO. OF ITEMS IN QUERY RESPONSE = 2
NO. OF ITEMS IN THE DATA BANK = 83
PERCENTAGE OF RESPONSE/TOTAL DATA BANK = 2.41

265955   GILBERT HES 6360 ANAHU    Mi   4
265957   GILBERT HES 6321 <6147>   Mi   3


Table 14. — ID: Sample EXIR query on a subset of the world Lycopersicon collection data bank

PRINT: (MINIMUM FRUIT SIZE, MAXIMUM FRUIT SIZE, SHAPE, SET, FASCIATION, I, Sw, PI NUMBER)
FOR ACCESSIONS WITH r,r AND (NOT I,+ OR NOT Sw,+)*

NO. OF ITEMS IN QUERY RESPONSE = 5
NO. OF ITEMS IN THE DATA BANK = 83
PERCENTAGE OF RESPONSE/TOTAL DATA BANK = 6.02
Conclusions and Future Research

The computer-processible nomenclature described in this paper can serve the needs of the community of scientists involved in research in tomato genetics. The nomenclatorial rules of the Tomato Genetics Cooperative were used as the basis for the computer-processible nomenclature, and the symbols chosen proved very effective in a computerized information storage and retrieval system. Display of information as well as selective retrieval and updating of information was greatly facilitated.

The gene symbol nomenclature shown for tomatoes is applicable to many crops as a vehicle for effective computer-assisted information management.

We discovered several inconsistencies and ambiguities during work with the nomenclature. Further development of the nomenclature will seek to resolve these inconsistencies. Future research will include investigation into the nomenclatorial systems being used in other crops such as oats and maize. The end product from this research will be a consistent system of nomenclature for gene symbols that is computer-processible, minimizes ambiguity, and is widely acceptable to the entire genetic research community.